ELVIS: Entertainment-led video summaries
© ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Multimedia Computing, Communications, and Applications, 6(3): Article no. 17 (2010) http://doi.acm.org/10.1145/1823746.1823751
Video summaries present the user with a condensed and succinct representation of the content of a video stream. Usually this is achieved by attaching degrees of importance to low-level image, audio and text features. However, video content elicits strong and measurable physiological responses in the user, which are potentially rich indicators of what video content is memorable to or emotionally engaging for an individual user. This article proposes a technique that exploits such physiological responses to a given video stream by a given user to produce Entertainment-Led VIdeo Summaries (ELVIS). ELVIS is made up of five analysis phases which correspond to the analyses of five physiological response measures: electro-dermal response (EDR), heart rate (HR), blood volume pulse (BVP), respiration rate (RR), and respiration amplitude (RA). Through these analyses, the temporal locations of the most entertaining video subsegments, as they occur within the video stream as a whole, are automatically identified. The effectiveness of the ELVIS technique is verified through a statistical analysis of data collected during a set of user trials. Our results show that ELVIS is more consistent than RANDOM, EDR, HR, BVP, RR and RA selections in identifying the most entertaining video subsegments for content in the comedy, horror/comedy, and horror genres. Subjective user reports also reveal that ELVIS video summaries are comparatively easy to understand, enjoyable, and informative.
Video Summarization Using Deep Semantic Features
Computer Vision - ACCV 2016: 13th Asian Conference on Computer Vision, Nov 20-24, 2016, Taipei, Taiwan
This paper presents a video summarization technique for Internet videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original video requires understanding its content. Furthermore, the content of Internet videos is very diverse, ranging from home videos to documentaries, which makes video summarization much harder because prior knowledge is rarely available. To tackle this problem, we propose to use deep video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard video summarization techniques. For this, we design a deep neural network that maps videos as well as descriptions to a common semantic space and jointly train it with associated pairs of videos and descriptions. To generate a video summary, we extract the deep features from each segment of the original video and apply a clustering-based summarization technique to them. We evaluate our video summaries using the SumMe dataset as well as baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features in a video summarization technique.
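The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that a clustering-based technique is applied to per-segment deep features, so plain k-means is assumed here, the feature vectors are made-up stand-ins for the network's output, and the names `kmeans` and `summarize` are hypothetical.

```python
import math
import random


def kmeans(features, k, iterations=20, seed=0):
    """Plain k-means (an assumed choice of clustering); returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct segments chosen at random.
    centroids = [list(f) for f in rng.sample(features, k)]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assign each segment to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(f, centroids[c]))
            for f in features
        ]
        # Move each centroid to the mean of its member segments.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def summarize(features, k):
    """Return, in temporal order, one representative segment index per cluster."""
    centroids, labels = kmeans(features, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # The representative is the segment closest to the cluster centre.
            chosen.append(
                min(members, key=lambda i: math.dist(features[i], centroids[c]))
            )
    return sorted(chosen)


# Hypothetical 2-D features for six segments forming two visual "themes".
segment_features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),
]
summary_segments = summarize(segment_features, k=2)
```

In this sketch the summary is the set of segments nearest the cluster centres, so each group of semantically similar segments contributes one representative; real deep features would have far more dimensions than the two used here.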
A scalable and extensible segment-event-object-based sports video retrieval system
Sports video data is growing rapidly as a result of the maturing digital technologies that support digital video capture, faster data processing, and large storage. However, (1) semi-automatic content extraction and annotation, (2) a scalable indexing model, and (3) effective retrieval and browsing still pose the most challenging problems for maximizing the usage of large video databases. This article presents the findings from a comprehensive work that proposes a scalable and extensible sports video retrieval system with two major contributions in the area of sports video indexing and retrieval. The first contribution is a new sports video indexing model that utilizes a semi-schema-based indexing scheme on top of an Object-Relationship approach. This indexing model is scalable and extensible as it enables gradual index construction, which is supported by the ongoing development of future content extraction algorithms. The second contribution is a set of novel queries, based on XQuery, that generate dynamic and user-oriented summaries and event structures. The proposed sports video retrieval system has been fully implemented and populated with soccer, tennis, swimming, and diving videos. The system has been evaluated with 20 users to demonstrate and confirm its feasibility and benefits. The experimental sports genres were specifically selected to represent the four main categories of the sports domain: period-, set-point-, time (race)-, and performance-based sports. Thus, the proposed system should be generic and robust for all types of sports.